(54) Title: APPARATUS AND METHOD FOR COMPRESSING BINARIZED IMAGES
(57) Abstract 



An apparatus and method for compressing binarized images (90) 
comprising receiving a binarized image (70) and generating a first 
sequence of first code symbols (80) representing the binarized image 
wherein at least one row of the image is represented in run-length 
encoded format and encoding a portion of the first sequence of code 
symbols using a preliminary encoding scheme, thereby to provide a first 
portion of a second sequence of code symbols, and, while encoding, 
accumulating the frequency of at least some of the first code symbols 
thus far encoded (100) and generating an additional portion of the second 
sequence using a modified version of the code scheme such that at 
least one subsequent code symbol in the first sequence with a large 
accumulated frequency is encoded more compactly in the second portion 
than at least one subsequent code symbol in the first sequence with a 
small accumulated frequency. 
APPARATUS AND METHOD FOR COMPRESSING BINARIZED IMAGES
FIELD OF THE INVENTION 

The present invention relates to methods for 
compressing binarized images, generally. 

BACKGROUND OF THE INVENTION 

Arithmetic coding is described in: 

Witten, I. H. et al., "Arithmetic coding for data
compression", Computing Practices, Communications of the
ACM, Jun. 1987, Vol. 30(6); and

"Arithmetic coding and statistical modeling". 
Dr. Dobb's Journal, Feb. 1991, pp. 16 - 29. 

The MR decoding scheme is described in CCITT 
Recommendation T.4 and T.6 for Groups 3 and 4. 

A conventional binarizing technique is described
in Foley, J. et al., Computer Graphics: Principles
and Practice, 2nd Ed., Section 13.1.2, pages 568 -
573.

The disclosures of all of the above publications
are hereby incorporated by reference.
SUMMARY OF THE INVENTION

The present invention seeks to provide an 
improved image manipulation system. 

There is thus provided in accordance with a 
preferred embodiment of the present invention a method 
for compressing binarized images including receiving a 
binarized image and generating a first sequence of first 
code symbols representing the binarized image wherein at 
least one row of the image is represented in run-length 
encoded format, and encoding a portion of the first 
sequence of code symbols using a preliminary encoding
scheme, thereby to provide a first portion of a second 
sequence of code symbols, and, while encoding, accumu- 
lating the frequency of at least some of the first code 
symbols thus far encoded and generating an additional 
portion of the second sequence using a modified version 
of the code scheme such that at least one subsequent 
code symbol in the first sequence with a large accumulat- 
ed frequency is encoded more compactly in the second 
portion than at least one subsequent code symbol in the 
first sequence with a small accumulated frequency. 

Further in accordance with a preferred embodi- 
ment of the present invention, a modified Huffman coding 
scheme is employed to generate the first sequence of 
first code symbols. 

In accordance with another preferred embodiment 
of the present invention, there is provided a method for 
compressing binarized images including receiving a binar- 
ized image and generating a first sequence of first code 
symbols representing the binarized image including a 
representation of one row of the binarized image and a 
representation of differences between at least one subse- 
quent row and at least one previous row, and encoding a 
portion of the first sequence of code symbols using a
preliminary encoding scheme, thereby to provide a first 
portion of a second sequence of code symbols, and, while 
encoding, accumulating the frequency of at least some of 
the first code symbols thus far encoded and generating an 
additional portion of the second sequence using a modi- 
fied version of the code scheme such that at least one 
subsequent code symbol in the first sequence with a large 
accumulated frequency is encoded more compactly in the 
second portion than at least one subsequent code symbol 
in the first sequence with a small accumulated frequency.

Further in accordance with a preferred embodi- 
ment of the present invention, the encoding scheme used 
to encode the first sequence of code symbols is continu- 
ally modified such that code symbols in the first se- 
quence with a large accumulated frequency are encoded 
more compactly in the second portion than subsequent code 
symbols in the first sequence with a small accumulated 
frequency. 

Still further in accordance with a preferred 
embodiment of the present invention, a modified-read
coding scheme is employed to generate the first sequence 
of first code symbols. 

Further in accordance with a preferred embodiment
of the present invention, a modified modified-read
coding scheme is employed to generate the first sequence 
of first code symbols. 

Still further in accordance with a preferred 
embodiment of the present invention, the method also 
includes binarizing a discrete level image, thereby to 
provide the binarized image. 

Additionally in accordance with a preferred 
embodiment of the present invention, the method also 
includes binarizing a continuous level image, thereby to 
provide the binarized image. 

Still further in accordance with a preferred 
embodiment of the present invention, arithmetic coding is
employed to translate the accumulated frequency of at 
least some of the first code symbols into second code 
symbols.

There is also provided, in accordance with a 
preferred embodiment of the present invention, apparatus 
for compressing binarized images including a run-length 
encoder operative to receive a binarized image and to 
generate a first sequence of first code symbols repre- 
senting the binarized image wherein at least one row of 
the image is represented in run-length encoded format, 
and an adaptive encoder operative to encode a portion of 
the first sequence of code symbols using a preliminary 
encoding scheme, thereby to provide a first portion of a 
second sequence of code symbols, and, while encoding, to 
accumulate the frequency of at least some of the first 
code symbols thus far encoded and to generate an addi- 
tional portion of the second sequence using a modified 
version of the code scheme such that at least one subse- 
quent code symbol in the first sequence with a large 
accumulated frequency is encoded more compactly in the
second portion than at least one subsequent code symbol 
in the first sequence with a small accumulated frequency. 

There is further provided, in accordance with a 
preferred embodiment of the present invention, apparatus 
for compressing binarized images including a binarized 
image compressor operative to receive a binarized image 
and to generate a first sequence of first code symbols 
representing the binarized image, the first sequence 
including a representation of one row of the binarized 
image and a representation of differences between at 
least one subsequent row and at least one previous row, 
and an adaptive encoder operative to encode a portion of 
the first sequence of code symbols using a preliminary 
encoding scheme, thereby to provide a first portion of a
second sequence of code symbols, and, while encoding, to 
accumulate the frequency of at least some of the first
code symbols thus far encoded and to generate an addi- 
tional portion of the second sequence using a modified 
version of the code scheme such that at least one subse- 
quent code symbol in the first sequence with a large 
accumulated frequency is encoded more compactly in the 
second portion than at least one subsequent code symbol 
in the first sequence with a small accumulated frequency. 

Further in accordance with a preferred embodiment
of the present invention, the binarized image compressor
employs a modified-read coding scheme to generate
the first sequence of first code symbols. 

Still further in accordance with a preferred 
embodiment of the present invention, the binarized image 
compressor employs a modified modified-read coding scheme
to generate the first sequence of first code symbols. 

Additionally in accordance with a preferred 
embodiment of the present invention, the adaptive encoder
employs arithmetic coding to translate the accumulated
frequency of at least some of the first code symbols into 
second code symbols. 

Still further in accordance with a preferred 
embodiment of the present invention, the encoding scheme 
used to encode the first sequence of code symbols is 
continually modified such that code symbols in the first 
sequence with a large accumulated frequency are encoded 
more compactly in the second portion than subsequent code 
symbols in the first sequence with a small accumulated 
frequency. 
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and 
appreciated from the following detailed description, 
taken in conjunction with the drawings in which: 

Fig. 1 is a simplified block diagram of an
image manipulation system constructed and operative in
accordance with a preferred embodiment of the present
invention; and

Fig. 2 is a simplified flowchart illustrating a 
preferred mode of operation in which the MR code element 
frequency accumulation unit of Fig. 1 processes a single 
MR code element in a sequence. 

Attached herewith are the following appen- 
dices which aid in the understanding and appreciation of 
one preferred embodiment of the invention shown and 
described herein: 

Appendix A is a computer listing of a preferred 
software embodiment of the MR coding, arithmetic coding 
and MR code element frequency accumulation units of Fig.
1; and

Appendix B is a computer listing of a preferred 
software embodiment of the arithmetic decoding, MR code 
frequency accumulation and MR decoding units of Fig. 1. 
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Reference is now made to Fig. 1 which is a 
simplified block diagram of an image manipulation system 
constructed and operative in accordance with a preferred 
embodiment of the present invention. 

As shown, a digital representation of an image 
is provided from any suitable source, such as a scanner 
10 which scans a substrate such as a continuous level 
photograph 20, a digital camera 30, a fax machine 40, an 
image creation workstation 50 such as a Macintosh 
equipped with the Adobe Photoshop software package, or a 
storage medium such as a hard disk 60. The digital repre- 
sentation of the image may be either a continuous level 
image or a discrete level image such as a document or 
other black and white image. 

If the digital representation of the image is 
not binary, the digital representation is binarized, as 
indicated in Fig. 1 by image binarization block 70, using 
any conventional binarizing technique such as those 
described in Foley, J. et al., Computer Graphics: Principles
and Practice, 2nd Ed., Section 13.1.2, pages 568 -
573.

The binarized image is then coded by MR coding 
unit 80, using the MR coding scheme described in CCITT 
Recommendation T.4 and T.6 for Groups 3 or 4.

The MR coded binarized image generated by MR 
coding unit 80 then undergoes arithmetic coding in arith- 
metic coding unit 90. The arithmetic coding unit 90 
receives as input: 

a. the sequence of MR code elements which forms 
the MR coded binarized image and 

b. the estimated probability of each MR code 
element, which is provided by an MR code element frequen- 
cy accumulation unit 100. Initially, the estimated proba- 
bilities of all MR code elements are typically taken to 
be equal. However, as the MR code element sequence flows 
into the MR code element frequency accumulation unit 100,
the estimated probabilities change based on the number of 
times each MR code element is encountered. 

The sequence of MR code elements typically 
includes code elements of 3 types: 

a. MR control type code elements; 

b. Black run length type code elements; and 

c. White run length type code elements. 
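The run length type code elements above describe stretches of same-colored pixels. As a rough illustration only, the following C sketch splits one row into alternating white and black run lengths. The function name `row_to_runs`, the one-byte-per-pixel layout and the convention of starting with a (possibly empty) white run are assumptions made for this example and are not taken from the appendices.

```c
#include <assert.h>
#include <stddef.h>

enum { WHITE = 0, BLACK = 1 };

/* Splits a row of pixels (one byte per pixel, zero = white, non-zero = black)
 * into alternating run lengths, starting with a white run that may be empty.
 * Returns the number of runs written to `runs`. */
size_t row_to_runs(const unsigned char *row, size_t width,
                   size_t *runs, size_t max_runs)
{
    size_t n = 0;       /* runs emitted so far */
    size_t pos = 0;     /* current pixel index */
    int color = WHITE;  /* color of the run being measured */

    while (pos < width) {
        size_t start = pos;
        /* Advance while the pixel has the color of the current run. */
        while (pos < width && (row[pos] != 0) == (color == BLACK))
            pos++;
        assert(n < max_runs);
        runs[n++] = pos - start;  /* only the first run can have length 0 */
        color = (color == WHITE) ? BLACK : WHITE;
    }
    return n;
}
```

For a row of two white pixels, three black pixels and one white pixel, this yields the run lengths 2, 3 and 1.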

The frequency accumulation unit 100 typically
receives as input each MR code element and, associated 
therewith, an indication of the type of that MR code 
element. Typically, unit 100 computes the relative code 
element frequency for each code element within its own 
code element type. 
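The per-type bookkeeping described above may be pictured with the following minimal C sketch. It keeps plain occurrence counts rather than the cumulative tables of Fig. 2, and all names (`type_counts`, `counts_init`, `counts_observe`, `relative_frequency`) are hypothetical. The alphabet sizes of 9 and 92 are the figures given elsewhere in this description.

```c
#include <assert.h>

/* Hypothetical names; alphabet sizes follow the figures given in the text. */
enum elem_type { MR_CONTROL_TYPE, BLACK_RUN_TYPE, WHITE_RUN_TYPE, NUM_TYPES };
#define MAX_ELEMS 92

struct type_counts {
    unsigned n_elems;           /* alphabet size of this type */
    unsigned count[MAX_ELEMS];  /* occurrences so far, per code element */
    unsigned total;             /* sum of count[] */
};

static const unsigned ELEMS_PER_TYPE[NUM_TYPES] = { 9, 92, 92 };

/* Every code element starts with a count of 1, i.e. all equally likely. */
void counts_init(struct type_counts model[NUM_TYPES])
{
    for (int t = 0; t < NUM_TYPES; t++) {
        model[t].n_elems = ELEMS_PER_TYPE[t];
        model[t].total = ELEMS_PER_TYPE[t];
        for (unsigned e = 0; e < MAX_ELEMS; e++)
            model[t].count[e] = (e < ELEMS_PER_TYPE[t]) ? 1u : 0u;
    }
}

/* Records one occurrence of `elem`; only the table of its own type changes. */
void counts_observe(struct type_counts model[NUM_TYPES],
                    enum elem_type t, unsigned elem)
{
    assert(elem < model[t].n_elems);
    model[t].count[elem]++;
    model[t].total++;
}

/* Relative frequency of `elem` within its own code element type. */
double relative_frequency(const struct type_counts model[NUM_TYPES],
                          enum elem_type t, unsigned elem)
{
    assert(elem < model[t].n_elems);
    return (double)model[t].count[elem] / (double)model[t].total;
}
```

Because each type has its own table, encountering an MR control element leaves the estimates for the black and white run length types unchanged.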

The arithmetic coding unit 90 may, if desired, 
be replaced by an entropy encoder or an adaptive Huffman 
encoder. If this is the case, then the arithmetic decod- 
ing unit 110, described below, is replaced by an entropy 
decoder or adaptive Huffman decoder, respectively. 

One software embodiment of arithmetic coding 
unit 90 is described in "Arithmetic coding and statisti- 
cal modeling". Dr. Dobb's Journal, Feb. 1991, pp. 16 - 
29. The above reference also provides a software embodi- 
ment of arithmetic decoding unit 110. 

An alternative implementation of MR code ele- 
ment frequency accumulation unit 100 is described below 
with reference to Fig. 2. 

The output of the arithmetic coding unit 90 is 
a very compact representation of the original image which 
is suitable, for example, for compact storage on any 
suitable optical or magnetic medium and/or for rapid 
facsimile transmission, 105, on conventional equipment
which preferably has an error correction capability, such
as a V.32bis modem.

The compact representation of the original 
image is decompressed after being transmitted or after 
being retrieved from archival. To decompress the compact
representation, the compressed data stream is fed to an 
arithmetic decoding unit 110 which replaces each arith- 
metically coded element with a corresponding MR code 
element according to the frequency of the arithmetically 
coded element. The frequency information is provided by 
an MR code element frequency accumulation unit 120 which 
is typically identical to unit 100. Initially, the esti- 
mated probabilities of all MR code elements are typically 
taken to be equal. However, as the MR code element se- 
quence flows into the MR code element frequency accumula- 
tion unit 120, the estimated probabilities change based 
on the number of times each MR code element is encountered.

The output of the arithmetic decoding unit 110 
is a sequence of MR code elements which is decoded by an 
MR decoding unit 130 using the MR decoding scheme de- 
scribed in CCITT Recommendation T.4 and T.6 for Groups 3 
or 4. 

The output of MR decoding unit 130 is a decom- 
pressed binarized image which is substantially identical 
to the binarized image generated by image binarization 
unit 70.

Fig. 2 is a simplified flowchart illustrating
a preferred mode of operation in which either of
the MR code element frequency accumulation units 100 or
120 of Fig. 1 processes a single MR code element in a
sequence of MR code elements.

If (process 210) there is a decision to reset,
i.e. to begin accumulating frequencies from zero, then
the method advances to stage 220. Otherwise, the method 
advances to stage 240. A reset is performed, for example, 
if a new image is to be processed whose characteristics 
are thought to differ significantly from the previous 
image processed. 

In process 220, a table is allocated for each 
of the three MR code element types. The number of cells
in each table typically exceeds the number of code elements
of each type by 1. The difference between the
content of the i'th cell in the table and the (i+1)th
cell in the table, also termed herein "the i'th 
interval", is indicative of the relative frequency of 
the i'th code element, within its code element type. 
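As an illustration of how such a table may be read, the C sketch below returns the bounds of the i'th interval together with the content of the last cell, which serves as the scale. It assumes 0-based cell indices and non-decreasing cell contents; the names `interval` and `interval_of` are hypothetical.

```c
#include <assert.h>

/* Bounds of one interval plus the overall scale (content of the last cell). */
struct interval {
    unsigned low;    /* content of cell i */
    unsigned high;   /* content of cell i + 1 */
    unsigned scale;  /* content of the last cell */
};

/* `cells` holds n_elems + 1 non-decreasing values; i is a 0-based index. */
struct interval interval_of(const unsigned *cells, unsigned n_elems, unsigned i)
{
    struct interval r;
    assert(i < n_elems);
    r.low = cells[i];
    r.high = cells[i + 1];
    r.scale = cells[n_elems];
    return r;
}
```

With cell contents 0, 1, 2, 5, 6, 7, 8, 9, 10, 11, the interval for index 2 runs from 2 to 5, so that code element accounts for 3 of the 11 counts in the table.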

Since there are 92 code elements of the White 
Run Length type and of the Black Run Length type, the 
tables for these two types each typically have 93 cells. 
Since there are 9 code elements of the MR Control type, 
the table for the MR Control type typically has 10 cells. 
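A bounded run length alphabet is possible because a long run is split into a make-up part and a terminating part. The C sketch below follows the split performed by `put_rl` in Appendix A (runs above 63 contribute a make-up value of len/64 + 63 followed by the remainder). The function name `run_to_elements` and the 1728-pixel limit are assumptions for this example. The values 0 to 90 produced here cover runs up to 1728 pixels; the sketch makes no claim about how the remaining entries of the alphabet are used, nor about the further index mapping applied by `put_code`.

```c
#include <assert.h>

#define MAX_RUN 1728  /* pixels per line assumed in Appendix A */

/* Splits a run length into at most two values, as put_rl does:
 * an optional make-up value (64..90) and a terminating value (0..63).
 * Returns the number of values written to `out`. */
int run_to_elements(int len, int out[2])
{
    int n = 0;
    assert(len >= 0 && len <= MAX_RUN);
    if (len > 63) {
        out[n++] = len / 64 + 63;  /* make-up part: multiples of 64 */
        len %= 64;
    }
    out[n++] = len;                /* terminating part: remainder */
    return n;
}
```

A run of 100 pixels thus becomes the pair 64 (one multiple of 64) and 36.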

PROCESS 230: The table contents are initialized 
by generating equal intervals such as, typically, intervals
having a length of 1.

PROCESS 240: Input is received: A single MR 
code element from the MR code element sequence represent- 
ing the image, and, associated therewith, its MR code 
element type, is received as input. 

PROCESS 250: Unit 100 allows arithmetic coder 
90 to arithmetically code the current MR code element, by 
supplying the frequency intervals stored in the table 
corresponding to the current MR code element to the 
arithmetic coder 90. For example, if the MR code element 
is of the MR_control type, the intervals stored in the 
MR_control table are employed. 

Unit 120 allows the decoder 110 to arithmeti- 
cally decode the current MR code element, by supplying 
the same information to decoder 110.
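On the decoding side, the same intervals may be searched in the opposite direction: given a value recovered by the arithmetic decoder, find the code element whose interval contains it. The following C sketch shows one straightforward linear search under the assumption of 0-based cells and intervals of length at least 1; the name `symbol_for_target` is hypothetical and the search strategy is not taken from Appendix B.

```c
#include <assert.h>

/* Returns the 0-based index i with cells[i] <= target < cells[i + 1].
 * `cells` holds n_elems + 1 strictly increasing values. */
unsigned symbol_for_target(const unsigned *cells, unsigned n_elems,
                           unsigned target)
{
    unsigned i;
    assert(target >= cells[0] && target < cells[n_elems]);
    for (i = 0; i + 1 < n_elems; i++)
        if (target < cells[i + 1])
            break;
    return i;  /* falls through to the last interval if no earlier match */
}
```

Since the encoder and decoder tables hold the same contents at each step, the decoder finds the same code element that the encoder used.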

PROCESS 260: The appropriate table is updated 
by incrementing by 1 the contents of each cell starting 
from the cell following the cell corresponding to the 
current code element. 

For example, if the fourth MR_control type code 
element is encountered, the contents of the fifth to 
ninth cells of the MR_control table are incremented by 1.
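Processes 230 and 260 may be sketched in C as follows. This is a simplified illustration using 0-based cell indices, so the cell numbering differs from the ordinal wording used above; the names `table_init` and `table_update` are hypothetical.

```c
#include <assert.h>

/* Process 230: equal intervals of length 1 (cell c holds the value c).
 * `cells` must have room for n_elems + 1 values. */
void table_init(unsigned *cells, unsigned n_elems)
{
    for (unsigned c = 0; c <= n_elems; c++)
        cells[c] = c;
}

/* Process 260: widen the interval of code element i (0-based) by 1 by
 * incrementing every cell after the cell corresponding to that element. */
void table_update(unsigned *cells, unsigned n_elems, unsigned i)
{
    assert(i < n_elems);
    for (unsigned c = i + 1; c <= n_elems; c++)
        cells[c]++;
}
```

After one update for index 3 in a freshly initialized 10-cell table, the interval for index 3 has length 2 while every other interval still has length 1.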

Preferably, old frequency information is given 
less weight than new frequency information. One implemen-
tation of this rule is: 

PROCESS 270: For each type t, each time a set number of code
elements of type t has been processed, divide the cell
contents of the frequency interval table of type t by a
suitable number such as 2. Suitable values for that number of
code elements are: 256 for the MR control type and 2048 for
the black and white run length types.
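The division step may be sketched in C as shown below. The sketch halves every cell and then restores a minimum interval length of 1, which mirrors the rescaling loop in `update_model` of Appendix A; the name `table_halve` and the 0-based layout are assumptions, and the decision of when to call it is left to the caller.

```c
/* Halves all cell contents so that older counts weigh less, while keeping
 * every interval at least 1 wide so no code element becomes impossible.
 * `cells` holds n_elems + 1 values. */
void table_halve(unsigned *cells, unsigned n_elems)
{
    cells[0] /= 2;
    for (unsigned c = 1; c <= n_elems; c++) {
        cells[c] /= 2;
        if (cells[c] <= cells[c - 1])
            cells[c] = cells[c - 1] + 1;
    }
}
```

For cell contents 0, 1, 2, 12, 13 the result is 0, 1, 2, 6, 7: the wide interval shrinks from 10 to 4 while the intervals of length 1 are preserved.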

Appendix A is a computer listing in C language, 
of a preferred software embodiment of the MR coding, 
arithmetic coding and MR code element frequency accumulation
units of Fig. 1.

Appendix B is a computer listing in C language, 
of a preferred software embodiment of the arithmetic 
decoding, MR code element frequency accumulation and MR 
decoding units of Fig. 1.

The programs listed in Appendices A and B may 
be run on a conventional computer such as any UNIX computer.

It is appreciated that the MR coding described 
hereinabove may alternatively be replaced by MMR coding
or other similar coding schemes. 

It is appreciated that the invention shown and 
described herein is suitable for compressing and decom- 
pressing any type of binarized image, such as binarized 
discrete level images or binarized continuous level 
images, also termed herein "halftone images".

In certain applications, it may be desirable to
use the compression methods shown and described herein to
compress only a portion of a binarized image. For example,
in medical imaging applications, the compression
methods shown and described herein may be employed to 
generally losslessly compress the foreground of the 
medical image whereas the background of the medical image 
may be compressed using lossy techniques. 
It is appreciated that the software components
of the present invention may, if desired, be implemented 
in ROM (read-only memory) form. The software components 
may, generally, be implemented in hardware, if desired, 
using conventional techniques. 

It is appreciated that the particular embodi- 
ment described in the Appendices is intended only to 
provide an extremely detailed disclosure of the present 
invention and is not intended to be limiting. 

It is appreciated that various features of the 
invention which are, for clarity, described in the con- 
texts of separate embodiments may also be provided in 
combination in a single embodiment. Conversely, various 
features of the invention which are, for brevity, de- 
scribed in the context of a single embodiment may also be 
provided separately or in any suitable subcombination.

It will be appreciated by persons skilled in 
the art that the present invention is not limited to what 
has been particularly shown and described hereinabove. 
Rather, the scope of the present invention is defined 
only by the claims that follow: 
C:\ARIK\COMPRESS\PTNTSRC\AGCMP.C - Thu Aug 25 09:03:04 1994
AGCMP COMPRESSION UTILITY

The following sources implement the suggested compression technique
previously described.

The agcmp program compresses a raw binary file (with no headers and with
a known line length) to a compressed file on the disk.

FILES:

agcmp.c - the main loop for compression. Converts the raw file to
MR codes and passes them to the arithmetic coder.

The following sources are common to both programs - agcmp and
agexp (decompression) - and handle the statistical estimation
(element frequency accumulation) and the arithmetic coding:

amdi.c - statistical estimation. Based on a source from Dr. Dobb's
Journal, February 1991, "Arithmetic Coding and Statistical
Modeling" by Mark R. Nelson, but modified to fit compression
of MR codes.

acoder.c, abitio.c - implement the arithmetic coder, based on Dr. Dobb's
Journal.

COMPILATION:

agcmp: cc agcmp.c amdi.c acoder.c abitio.c

FURTHER INFORMATION about agcmp.c

AUTHOR: Arik Gordon

INPUT: A rastered file (no headers) with 1728 binary pixels per line
OUTPUT: Compressed file.
USAGE: agcmp in_file out_file

Desc: This source opens a rastered binary file, converts it to codes
according to the MR standard, and passes the codes to the arithmetic
coder. The compressed file is constructed from a header (see agcmp.h)
and the compressed entropy coded stream.



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <memory.h>
#include <malloc.h>
#include <sys\types.h>
#include <sys\stat.h>
#include <dos.h>
#include "acoder.h"
#include "amodel.h"
#include "abitio.h"
#include "agcmp.h"

static char *last_line_in_prev_strip;

long agcmp(char *infile, char *outfile); // returns size in bytes
long add_file(char *in, int out);
long mr_compress_strip(char bufi[], int lines);
void modified_READ(char *prev, char *cur, char *next, int length);
void one_line_modified_read(char *prev, char *curr, int length);
void put_rl(int len, int color);
void put_code(int len, int color);
void put_EOL();
find_next(int color, int pos, char *line, int len);
void erase_single_dots(char *prev, char *curr, char *next, int len);

main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "\nusage: %s IMG_file_name G3_output_file_name\n", argv[0]);
        exit(1);
    }

    printf("total bytes = %ld\n", agcmp(argv[1], argv[2]));
}



long agcmp(char *infile, char *outfile) // returns size in bytes
{
    long total_bytes = 0L;
    unsigned int i, j = 0, k, file_count;
    char *bufi;
    unsigned size_in_bytes;
    AG_HEADER ag_header;
    int fdi, fdtmp;

    if ((fdi = open(infile, O_RDONLY | O_BINARY, S_IREAD | S_IWRITE)) < 1)
        BigErr(1, "cmr: can't open");

    /* INITIALIZE ARITHMETIC CODER */
    initialize_model();
    init_mr_model();
    initialize_output_bitstream(outfile, &ag_header, sizeof(AG_HEADER));
    initialize_arithmetic_encoder();

    if ((bufi = malloc(STRIP_SIZE * BYTES_PER_LINE)) == NULL)
        BigErr(1, "agcmp: no mem");
    if ((last_line_in_prev_strip = malloc(PELS_PER_LINE)) == NULL)
        BigErr(1, "agcmp1: no mem");

    memset(last_line_in_prev_strip, 0, PELS_PER_LINE);
    ag_header.number_of_lines_in_file = 0;

    /* Main loop for compression */
    while ((file_count = read(fdi, bufi, STRIP_SIZE * BYTES_PER_LINE)) >= BYTES_PER_LINE) {
        fprintf(stderr, "COMPRESSING STRIP #%d\r", j++);
        ag_header.number_of_lines_in_file += file_count / BYTES_PER_LINE;
        mr_compress_strip(bufi, file_count / BYTES_PER_LINE);
        _heapmin();
    }

    fprintf(stderr, "\n");
    free(last_line_in_prev_strip);
    free(bufi);
    close(fdi);
    _heapmin();

    /* Finish and close arithmetic coding */
    code_EOF();
    flush_arithmetic_encoder();
    total_bytes = flush_output_bitstream(&ag_header, sizeof(AG_HEADER));
    free_mdl_bufs();

    return(total_bytes);
}

/* compress one strip (arbitrary size, defined in agcmp.h) */
long mr_compress_strip(char bufi[], int lines)
{
    char array[3][PELS_PER_LINE];
    unsigned k, i, cur_line = 2, off;

    // Fill first 2 lines in array.
    for (k = 0; k < min(2, lines); k++)
        for (i = 0; i < PELS_PER_LINE; i++)
            array[k+1][i] = ((bufi[k*BYTES_PER_LINE + i/8] & (1 << (7 - (i%8)))) != 0);

    if (lines > 0) // There is at least 1 line to compress
        modified_READ(NULL, &array[1][0], &array[2][0], PELS_PER_LINE); // First array compression

    /* convert packed bits to "1 bit per byte" format */
    while (cur_line < lines) {
        memcpy(&array[0][0], &array[1][0], 2 * PELS_PER_LINE);

        for (i = 0; i < PELS_PER_LINE; i++) {
            off = cur_line * BYTES_PER_LINE + i/8;
            if (bufi[off] == 0) {
                memset(&(array[2][i]), 0, 8);
                i += 7;
                continue;
            }
            if (bufi[off] == 255) {
                memset(&(array[2][i]), 1, 8);
                i += 7;
                continue;
            }
            array[2][i] = ((bufi[off] & (1 << (7 - (i%8)))) != 0);
        }

        cur_line++;

        /* compress one line (given the previous line) */
        /* (we also provide the next line in case some filtering is
           desired) */
        modified_READ(&array[0][0], &array[1][0], &array[2][0], PELS_PER_LINE);
    }

    /* do last line */
    if (lines > 1) {
        memcpy(&array[0][0], &array[1][0], 2 * PELS_PER_LINE);
        modified_READ(&array[0][0], &array[1][0], NULL, PELS_PER_LINE);
    }

    return(1);
}



void modified_READ(char *prev, char *cur, char *next, int length)
{
    int j;
    long i;

    cur[0] = WHITE; // don't accept a black pixel on the line beginning

    if (prev == NULL) {
        one_line_modified_read(last_line_in_prev_strip, cur, length);
        return;
    }

    memcpy(last_line_in_prev_strip, cur, PELS_PER_LINE);
    one_line_modified_read(prev, cur, length);
}

/* Here we actually translate the line to MR codes + Run-Lengths
   and pass the codes to the arithmetic coder */
void one_line_modified_read(char *prev, char *curr, int length)
{
    int a0, a1, a2, b1, b2, a0_color;

    a0 = -1; a0_color = WHITE;

    // *curr = WHITE; // don't accept a black pixel on line beginning
    do {
        a1 = find_next(!a0_color, a0+1, curr, length);
        a2 = find_next(a0_color, a1+1, curr, length);

        if (a0 == -1)
            b1 = find_next(!a0_color, a0+1, prev, length);
        else if (prev[a0] == a0_color)
            b1 = find_next(!a0_color, a0+1, prev, length);
        else {
            b1 = find_next(a0_color, a0+1, prev, length);
            b1 = find_next(!a0_color, b1+1, prev, length);
        }

        b2 = find_next(a0_color, b1+1, prev, length);

        // code it
        if (b2 < a1) { // PASS mode
            //printf("PASS (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
            code_1(MR_CONTROL, PASS);
            a0 = b2;
        } else if (abs(a1-b1) <= 3) { // VERTICAL mode
            switch (a1-b1) {
            case 0:
                //printf("V0 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, V0);
                break;
            case 1:
                //printf("VR1 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VR1);
                break;
            case -1:
                //printf("VL1 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VL1);
                break;
            case 2:
                //printf("VR2 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VR2);
                break;
            case -2:
                //printf("VL2 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VL2);
                break;
            case 3:
                //printf("VR3 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VR3);
                break;
            case -3:
                //printf("VL3 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VL3);
                break;
            }
            a0 = a1;
        } else { // HORIZONTAL mode
            if (a0 == -1)
                a0 = 0;
            //printf("HORIZONTAL: COLOR = %d, LEN1 = %d, LEN2 = %d (a0=%d)\n", a0_color, a1-a0, a2-a1, a0);
            code_1(MR_CONTROL, HOR);
            put_rl(a1-a0, a0_color);
            put_rl(a2-a1, !a0_color);
            a0 = a2;
        }

        if (a0 < length)
            a0_color = curr[a0];
    } while (a0 < length);
    //printf("EOL\n");
    //put_EOL(line); /* we don't need it because next a0 is beyond line */
}

/* converts a single run-length (unlimited length) to several runs
   according to MR (Group 3, 4) spec */
void put_rl(int len, int color)
{
    if (len > 63) {
        put_code((len/64) + 63, color);
        len -= (len/64) * 64;
    }
    put_code(len, color);
}

/* codes one legitimate run */
void put_code(int len, int color)
{
    code_1(color, BW_SYMBOLS - len - 1);
}

/* we do not need this if we know the line length in advance */
void put_EOL()
{
    //code_1(WHITE, EOL);
    //code_1(BLACK, EOL);
}

/* finds the next color interchange */
find_next(int color, int pos, char *line, int len)
{
    int i;
    char *ptr;

    if (pos > len-1)
        return(len);

    if ((ptr = memchr(line+pos, color, len-pos)) == NULL)
        return len;
    else
        return (ptr-line);
}

BigErr(int n, char *s) // too many bits in strip.
{
    printf("err %d - %s", n, s);
    exit(1);
}

/* codes 1 symbol (Control or Black Run or White Run) */
code_1(int mode, int c)
{
    SYMBOL s;

    convert_int_to_symbol(c, &s, mode);
    encode_symbol(&s);
    update_model(c);
}

/* to finish with the arithmetic coding: */
code_EOF()
{
    SYMBOL s;

    convert_int_to_symbol(EOF, &s, MR_CONTROL);
    encode_symbol(&s);
}
/* Desc: Header file mainly for agcmp.c, agexp.c */
/* AUTHOR: Arik Gordon */

/* This is a header that appears at the beginning of the compressed file */
typedef struct AG_HEADER {
    long total_bytes;
    long number_of_lines_in_file;
} AG_HEADER;

/* in our implementation we assume a standard fax document with 1728 pixels
   per line */
#define PELS_PER_LINE 1728
#define BYTES_PER_LINE 216

#define STRIP_SIZE 100

#define WHITE 0
#define BLACK 1
#define MR_CONTROL 2

#define MR_SYMBOLS 9
#define BW_SYMBOLS 93

#define V0 8
#define PASS 2
#define VL1 3
#define VR1 4
#define HOR 5
#define VL2 6
#define VL3 7
#define VR2 1
#define VR3 0

#define Peep() putch(7)
/*
 * Listing 9 - amdi.c
 *
 * AUTHOR: Originally from Dr. Dobb's, Feb 1991. Substantially modified
 * by Arik Gordon.
 *
 * This is the statistical estimation module for compressing
 * MR codes. There are three types of codes: MR_CONTROL, BLACK Run-Length
 * and WHITE Run-Length. For each type we have a separate statistical
 * estimator of order 0 for run-lengths and order 2 for MR_CONTROL.
 *
 * This is a relatively simple model. For each symbol type,
 * the totals for all of the symbols are stored in a corresponding
 * array (e.g. "mr_storage"). This array has valid indices from -1
 * to Ni. The reason for having a -1 element is because the EOF
 * symbol is included in the table, and it has a value of -1.
 * (Ni = number of different symbols for each type)
 *
 * The total count for all the symbols is stored in totals[Ni], and
 * the low and high counts for symbol c are found in array[c] and
 * array[c+1].
 */

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <io.h>
#include <errno.h>
#include <fcntl.h>
#include <sys\types.h>
#include <sys\stat.h>

#include "agcmp.h"
#include "acoder.h"
#include "amodel.h"

/*
 * In order to create an array with indices -1 through num_of_symbols, I have
 * to do this funny declaration. totals[-1] == storage[0].
 */
short int **mr_storage;
short int *wt_storage;
short int *bl_storage;
short int *totals;

static int num_of_symbols, maximum_scale;
static int prev, prev1;

/*
 * When the model is first started up, each symbol has a count of
 * 1, which means a low value of c+1, and a high value of c+2.
 */
void initialize_model()
{
    int i, j, order_2_symbols;

    prev = prev1 = 0;
    num_of_symbols = MR_SYMBOLS;
    order_2_symbols = num_of_symbols * num_of_symbols;
    /* one table per (prev1, prev) context; element types match the declarations */
    mr_storage = (short int **) malloc(sizeof(short int *) * (order_2_symbols+1));
    for (i = 0; i < order_2_symbols; i++)
        mr_storage[i] = malloc(sizeof(short int) * (num_of_symbols+2));

    for (j = 0; j < order_2_symbols; j++) {
        totals = &(mr_storage[j][1]);
        for (i = -1; i <= num_of_symbols; i++)
            totals[i] = i + 1;
    }

    num_of_symbols = BW_SYMBOLS;

    wt_storage = malloc((num_of_symbols+2) * sizeof(short int));
    totals = &(wt_storage[1]);
    for (i = -1; i <= num_of_symbols; i++)
        totals[i] = i + 1;

    bl_storage = malloc((num_of_symbols+2) * sizeof(short int));
    totals = &(bl_storage[1]);
    for (i = -1; i <= num_of_symbols; i++)
        totals[i] = i + 1;
}

/*
 * Updating the model means incrementing every single count from
 * the high value for the symbol on up to the total. Then, there
 * is a complication. If the cumulative total has gone up to
 * the maximum value, we need to rescale. Fortunately, the rescale
 * operation is relatively rare.
 */
void update_model( int symbol )
{
    int i;

    for ( symbol++ ; symbol <= num_of_symbols ; symbol++ )
        totals[ symbol ]++;
    if ( totals[ num_of_symbols ] == maximum_scale )
    {
        for ( i = 0 ; i <= num_of_symbols ; i++ )
        {
            totals[ i ] /= 2;
            if ( totals[ i ] <= totals[ i-1 ] )
                totals[ i ] = totals[ i-1 ] + 1;
        }
    }
}

/*
 * Finding the low count, high count, and scale for a symbol
 * is really easy, because of the way the totals are stored.
 * This is the one redeeming feature of the data structure used
 * in this implementation.
 */
int convert_int_to_symbol( int c, SYMBOL *s, int mode )
{
    switch(mode) {
    case WHITE:
        totals = wt_storage + 1;
        num_of_symbols = BW_SYMBOLS;
        maximum_scale = 2048;
        break;
    case BLACK:
        totals = bl_storage + 1;
        num_of_symbols = BW_SYMBOLS;
        maximum_scale = 2048;
        break;
    case MR_CONTROL:
        num_of_symbols = MR_SYMBOLS;
        totals = mr_storage[(prev1 * num_of_symbols + prev)] + 1;
        prev1 = prev;
        prev = c;
        maximum_scale = 256;
        break;
    }
    s->scale = totals[ num_of_symbols ];
    s->low_count = totals[ c ];
    s->high_count = totals[ c+1 ];
    return(0);
}

/*
 * Getting the scale for the current context is easy.
 */
void get_symbol_scale( SYMBOL *s, int mode, int prev, int prev1 )
{
    switch(mode) {
    case WHITE:
        totals = wt_storage + 1;
        num_of_symbols = BW_SYMBOLS;
        maximum_scale = 2048;
        break;
    case BLACK:
        totals = bl_storage + 1;
        num_of_symbols = BW_SYMBOLS;
        maximum_scale = 2048;
        break;
    case MR_CONTROL:
        num_of_symbols = MR_SYMBOLS;
        totals = mr_storage[(prev1 * num_of_symbols + prev)] + 1;
        maximum_scale = 256;
        break;
    }
    s->scale = totals[ num_of_symbols ];
}

/*
 * During decompression, we have to search through the table until
 * we find the symbol that straddles the "count" parameter. When
 * it is found, it is returned. The reason for also setting the
 * high count and low count is so that the symbol can be properly removed
 * from the arithmetic coded input.
 */
int convert_symbol_to_int( int count, SYMBOL *s )
{
    int c;

    for ( c = num_of_symbols-1 ; count < totals[ c ] ; c-- )
        ;   /* empty body: the loop only searches for c */
    s->high_count = totals[ c+1 ];
    s->low_count = totals[ c ];
    return( c );
}

/* The following is an optional module that initializes the statistical
   estimation tables with pre-defined values. It can slightly improve
   compression of small files */

init_mr_model()
{
    int i;

    update_initial_mr_model(V0, 6);
    update_initial_mr_model(VL1, 2);
    update_initial_mr_model(VR1, 2);
    update_initial_mr_model(HOR, 2);
    update_initial_mr_model(PASS, 1);
}

update_initial_mr_model(int symbol, int count)
{
    int i, prev, prev1, j;

    num_of_symbols = MR_SYMBOLS;
    maximum_scale = 256;

    for (prev = 0; prev < num_of_symbols; prev++)
        for (prev1 = 0; prev1 < num_of_symbols; prev1++) {
            totals = mr_storage[(prev1 * num_of_symbols + prev)] + 1;
            for (j = 0; j < count; j++)
                update_model(symbol);
        }
}

free_amdl_bufs()
{
    int i, order_2_symbols;

    num_of_symbols = MR_SYMBOLS;
    order_2_symbols = num_of_symbols * num_of_symbols;
    for (i = 0; i < order_2_symbols; i++)
        free(mr_storage[i]);
    free(mr_storage);

    num_of_symbols = BW_SYMBOLS;
    free(wt_storage);
    free(bl_storage);
    // _heapmin();
}
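The cumulative-count table maintained by amdl.c can be illustrated in isolation. The following is a minimal sketch, not the listing's own code: the type `CumModel`, the functions `cum_init`, `cum_update` and `cum_width`, and the small constants `NSYM` and `MAX_SCALE` are hypothetical names chosen for the example. It assumes the same layout as the listing, with an extra slot so that index -1 (the EOF symbol) is valid.

```c
#include <assert.h>

#define NSYM 4        /* hypothetical alphabet size, symbols 0..NSYM-1 */
#define MAX_SCALE 16  /* hypothetical rescale threshold */

/* storage[0] plays the role of totals[-1], the slot for EOF (-1). */
typedef struct {
    short storage[NSYM + 2];
    short *totals;    /* points at storage[1], so totals[-1] is valid */
} CumModel;

/* Every symbol, including EOF, starts with a count of 1. */
static void cum_init(CumModel *m)
{
    int i;
    m->totals = &m->storage[1];
    for (i = -1; i <= NSYM; i++)
        m->totals[i] = (short)(i + 1);
}

/* Count one occurrence of `symbol`; halve all counts at MAX_SCALE. */
static void cum_update(CumModel *m, int symbol)
{
    int i;
    assert(symbol >= -1 && symbol < NSYM);
    for (i = symbol + 1; i <= NSYM; i++)
        m->totals[i]++;
    if (m->totals[NSYM] == MAX_SCALE) {
        for (i = 0; i <= NSYM; i++) {
            m->totals[i] /= 2;
            /* keep the table strictly increasing after halving */
            if (m->totals[i] <= m->totals[i - 1])
                m->totals[i] = (short)(m->totals[i - 1] + 1);
        }
    }
}

/* Width of the interval currently assigned to `symbol`. */
static int cum_width(const CumModel *m, int symbol)
{
    return m->totals[symbol + 1] - m->totals[symbol];
}
```

A symbol that has been seen often owns a wider interval, which is what lets the arithmetic coder spend fewer bits on it. Rescaling halves every entry but never lets an interval collapse to zero width.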
/*
 * Listing 8 - amodel.h
 *
 * This file contains all of the function prototypes and
 * external variable declarations needed to interface with
 * the modeling code found in amdl.c.
 */

/*
 * External variable declarations.
 */
extern int max_order;
extern int flushing_enabled;

/*
 * Prototypes for routines that can be called from MODEL-X.C
 */
void initialize_model( void );
void update_model( int symbol );
int convert_int_to_symbol( int symbol, SYMBOL *s, int mode );
void get_symbol_scale( SYMBOL *s, int mode, int prev, int prev1 );
int convert_symbol_to_int( int count, SYMBOL *s );
void add_character_to_model( int c );
void flush_model( void );
/*
 * Listing 2 - coder.c
 *
 * SOURCE: Dr. Dobb's Journal, Feb 1991 + minor modifications by
 *         Arik Gordon
 *
 * This file contains the code needed to accomplish arithmetic
 * coding of a symbol. All the routines in this module need
 * to know in order to accomplish coding is what the probabilities
 * and scales of the symbol counts are. This information is
 * generally passed in a SYMBOL structure.
 *
 * This code was first published by Ian H. Witten, Radford M. Neal
 * and John G. Cleary in "Communications of the ACM" in June 1987,
 * and has been modified slightly.
 */

#include <stdio.h>
#include "acoder.h"
#include "abitio.h"
#include "agcmp.h"

/*
 * These four variables define the current state of the arithmetic
 * coder/decoder. They are assumed to be 16 bits long. Note that
 * by declaring them as short ints, they will actually be 16 bits
 * on most 80X86 and 680X0 machines, as well as VAXen.
 */
static unsigned short int code;  /* The present input code value      */
static unsigned short int low;   /* Start of the current code range   */
static unsigned short int high;  /* End of the current code range     */
long underflow_bits;             /* Number of underflow bits pending  */

/*
 * This routine must be called to initialize the encoding process.
 * The high register is initialized to all 1s, and it is assumed that
 * it has an infinite string of 1s to be shifted into the lower bit
 * positions when needed.
 */
void initialize_arithmetic_encoder()
{
    low = 0;
    high = 0xffff;
    underflow_bits = 0;
}

/*
 * This routine is called to encode a symbol. The symbol is passed
 * in the SYMBOL structure as a low count, a high count, and a range,
 * instead of the more conventional probability ranges. The encoding
 * process takes two steps. First, the values of high and low are
 * updated to take into account the range restriction created by the
 * new symbol. Then, as many bits as possible are shifted out to
 * the output stream. Finally, high and low are stable again and
 * the routine returns.
 */
void _fastcall encode_symbol( SYMBOL *s )
{
    long range;
/*
 * These three lines rescale high and low for the new symbol.
 */
    range = (long) ( high-low ) + 1;
    high = low + (unsigned short int)
                 (( range * s->high_count ) / s->scale - 1 );
    low = low + (unsigned short int)
                 (( range * s->low_count ) / s->scale );
/*
 * This loop turns out new bits until high and low are far enough
 * apart to have stabilized.
 */
    for ( ; ; )
    {
/*
 * If this test passes, it means that the MSDigits match, and can
 * be sent to the output stream.
 */
        if ( ( high & 0x8000 ) == ( low & 0x8000 ) )
        {
            output_bit( high & 0x8000 );
            while ( underflow_bits > 0 )
            {
                output_bit( ~high & 0x8000 );
                underflow_bits--;
            }
        }
/*
 * If this test passes, the numbers are in danger of underflow, because
 * the MSDigits don't match, and the 2nd digits are just one apart.
 */
        else if ( ( low & 0x4000 ) && !( high & 0x4000 ))
        {
            underflow_bits += 1;
            low &= 0x3fff;
            high |= 0x4000;
        }
        else
            return;
        low <<= 1;
        high <<= 1;
        high |= 1;
    }
}

/*
 * At the end of the encoding process, there are still significant
 * bits left in the high and low registers. We output two bits,
 * plus as many underflow bits as are necessary.
 */
void flush_arithmetic_encoder()
{
    output_bit( low & 0x4000 );
    underflow_bits++;
    while ( underflow_bits-- > 0 )
        output_bit( ~low & 0x4000 );
}

/*
 * When decoding, this routine is called to figure out which symbol
 * is presently waiting to be decoded. This routine expects to get
 * the current model scale in the s->scale parameter, and it returns
 * a count that corresponds to the present floating point code:
 *
 *  code = count / s->scale
 */
int get_current_count( SYMBOL *s )
{
    long range;
    short int count;

    range = (long) ( high - low ) + 1;
    count = (short int)
            ((((long) ( code - low ) + 1 ) * s->scale-1 ) / range );
    return( count );
}

/*
 * This routine is called to initialize the state of the arithmetic
 * decoder. This involves initializing the high and low registers
 * to their conventional starting values, plus reading the first
 * 16 bits from the input stream into the code value.
 */
void initialize_arithmetic_decoder()
{
    int i;

    code = 0;
    for ( i = 0 ; i < 16 ; i++ )
    {
        code <<= 1;
        code += input_bit();
    }
    low = 0;
    high = 0xffff;
}



/*
 * Just figuring out what the present symbol is doesn't remove
 * it from the input bit stream. After the character has been
 * decoded, this routine has to be called to remove it from the
 * input stream.
 */
void remove_symbol_from_stream( SYMBOL *s )
{
    long range;
/*
 * First, the range is expanded to account for the symbol removal.
 */
    range = (long)( high - low ) + 1;
    high = low + (unsigned short int)
                 (( range * s->high_count ) / s->scale - 1 );
    low = low + (unsigned short int)
                 (( range * s->low_count ) / s->scale );
/*
 * Next, any possible bits are shipped out.
 */
    for ( ; ; )
    {
/*
 * If the MSDigits match, the bits will be shifted out.
 */
        if ( ( high & 0x8000 ) == ( low & 0x8000 ) )
        {
        }
/*
 * Else, if underflow is threatening, shift out the 2nd MSDigit.
 */
        else if ( (low & 0x4000) == 0x4000 && (high & 0x4000) == 0 )
        {
            code ^= 0x4000;
            low &= 0x3fff;
            high |= 0x4000;
        }
/*
 * Otherwise, nothing can be shifted out, so I return.
 */
        else
            return;
        low <<= 1;
        high <<= 1;
        high |= 1;
        code <<= 1;
        code += input_bit();
    }
}
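The first step of both encode_symbol and remove_symbol_from_stream is the same piece of arithmetic: the current [low, high] range is narrowed to the slice owned by the symbol's counts. A minimal sketch of just that step, with no bit shifting, is given here. The `Interval` type and the function name `narrow` are hypothetical and do not appear in the listing.

```c
#include <assert.h>

typedef struct {
    unsigned short low;   /* start of the current code range */
    unsigned short high;  /* end of the current code range (inclusive) */
} Interval;               /* hypothetical helper type */

/* Narrow `in` to the sub-range owned by counts [low_count, high_count)
   on a scale of `scale`. Renormalization (shifting bits out) is omitted. */
static Interval narrow(Interval in, long low_count, long high_count, long scale)
{
    Interval out;
    long range = (long)(in.high - in.low) + 1;

    assert(0 <= low_count && low_count < high_count && high_count <= scale);
    out.high = (unsigned short)(in.low + (range * high_count) / scale - 1);
    out.low  = (unsigned short)(in.low + (range * low_count) / scale);
    return out;
}
```

Starting from the full range 0x0000 to 0xffff, a symbol owning counts 1 to 3 out of 4 yields 0x4000 to 0xbfff. That is exactly the situation the underflow test in the listing guards against: the top bits differ, while the second bits are 1 in low and 0 in high.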
/*
 * Listing 1 - acoder.h
 *
 * This header file contains the constants, declarations, and
 * prototypes needed to use the arithmetic coding routines. These
 * declarations are for routines that need to interface with the
 * arithmetic coding stuff in acoder.c
 *
 */

#define MAXIMUM_SCALE   2048   /* (was 16383) Maximum allowed frequency count */
#define ESCAPE          256    /* The escape symbol                */
#define DONE            -1     /* The output stream empty symbol  */
#define FLUSH           -2     /* The symbol to flush the model   */

/*
 * A symbol can either be represented as an int, or as a pair of
 * counts on a scale. This structure gives a standard way of
 * defining it as a pair of counts.
 */
typedef struct {
    unsigned short int low_count;
    unsigned short int high_count;
    unsigned short int scale;
} SYMBOL;

extern long underflow_bits;    /* The present underflow count in  */
                               /* the arithmetic coder.           */

/*
 * Function prototypes.
 */
void initialize_arithmetic_decoder();
void remove_symbol_from_stream( SYMBOL *s );
void initialize_arithmetic_encoder( void );
void encode_symbol( SYMBOL *s );
void flush_arithmetic_encoder();
int get_current_count( SYMBOL *s );
/*
 * Listing 4 - abitio.c
 *
 * SOURCE: Dr. Dobb's Journal, Feb 1991 + minor modifications by
 *         Arik Gordon
 *
 * This routine contains a set of bit oriented i/o routines
 * used for arithmetic data compression. The important fact to
 * know about these is that the first bit is stored in the msb of
 * the first byte of the output, like you might expect.
 *
 * Both input and output maintain a local buffer so that they only
 * have to do block reads and writes. This is done in spite of the
 * fact that C standard I/O does the same thing. If these
 * routines are ever ported to assembly language the buffering
 * will come in handy.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memcpy, used in flush_output_bitstream */
#include "acoder.h"
#include "abitio.h"
#include "agcmp.h"

#define BUFFER_SIZE 8192

static char *buffer;            /* This is the i/o buffer      */
static char *current_byte;      /* Pointer to current byte     */

static int output_mask;         /* During output, this byte    */
                                /* contains the mask that is   */
                                /* applied to the output byte  */
                                /* if the output bit is a 1    */

static int input_bytes_left;    /* During input, these three   */
static int input_bits_left;     /* variables keep track of my  */
static int past_eof;            /* input state. The past_eof   */
                                /* byte comes about because    */
                                /* of the fact that there is   */
static long total_bytes;        /* a possibility the decoder   */
                                /* can legitimately ask for    */
                                /* more bits even after the    */
                                /* entire file has been        */
                                /* sucked dry.                 */

static FILE *stream;



/*
 * This routine is called once to initialize the output bitstream.
 * All it has to do is set up the current_byte pointer, clear out
 * all the bits in my current output byte, and set the output mask
 * so it will set the proper bit next time a bit is output.
 */
void initialize_output_bitstream(char *file, void *header, unsigned int header_size)
{
    buffer = malloc(BUFFER_SIZE+2);
    if (buffer == NULL) {
        printf("\niobs: no mem\n");
        exit(9);
    }
    total_bytes = 0L;
    current_byte = buffer;
    *current_byte = 0;
    output_mask = 0x80;
    stream = fopen(file, "wb");
    setvbuf( stream, NULL, _IOFBF, 8192 );
    total_bytes += fwrite(header, 1, header_size, stream);
    //printf("total bytes = %ld\n", total_bytes);
}

/*
 * The output_bit routine just has to set a bit in the current byte
 * if requested to. After that, it updates the mask. If the mask
 * shows that the current byte is filled up, it is time to go to the
 * next character in the buffer. If the next character is past the
 * end of the buffer, it is time to flush the buffer.
 */
void output_bit(int bit)
{
    if ( bit )
        *current_byte |= output_mask;
    output_mask >>= 1;
    if ( output_mask == 0 )
    {
        output_mask = 0x80;
        current_byte++;
        if ( current_byte == ( buffer + BUFFER_SIZE ) )
        {
            total_bytes += fwrite( buffer, 1, BUFFER_SIZE, stream );
            current_byte = buffer;
        }
        *current_byte = 0;
    }
}

/*
 * When the encoding is done, there will still be a lot of bits and
 * bytes sitting in the buffer waiting to be sent out. This routine
 * is called to clean things up at that point.
 */
long flush_output_bitstream(void *header, unsigned int header_size)
{
    total_bytes += fwrite( buffer, 1, (size_t)( current_byte - buffer ) + 1, stream );
    current_byte = buffer;
    fseek(stream, 0L, SEEK_SET);
    memcpy(header, &total_bytes, sizeof(long));
    fwrite(header, header_size, 1, stream);
    fclose(stream);
    free(buffer);
    _heapmin();
    return(total_bytes);
}
/*
 * Bit oriented input is set up so that the next time the input_bit
 * routine is called, it will trigger the read of a new block. That
 * is why input_bits_left is set to 0.
 */
void initialize_input_bitstream(char *file, void *header, unsigned int header_size)
{
    buffer = malloc(BUFFER_SIZE+2);
    if (buffer == NULL) {
        printf("\niibs: no mem\n");
        exit(9);
    }
    input_bits_left = 0;
    input_bytes_left = 1;
    past_eof = 0;
    stream = fopen(file, "rb");
    setvbuf( stream, NULL, _IOFBF, 8192 );
    fread(header, 1, header_size, stream);
}

close_input_bitstream()
{
    free(buffer);
    _heapmin();
    fclose(stream);
}

/*
 * This routine reads bits in from a file. The bits are all sitting
 * in a buffer, and this code pulls them out, one at a time. When the
 * buffer has been emptied, that triggers a new file read, and the
 * pointers are reset. This routine is set up to allow for two dummy
 * bytes to be read in after the end of file is reached. This is because
 * we have to keep feeding bits into the pipeline to be decoded so that
 * the old stuff that is 16 bits upstream can be pushed out.
 */
int input_bit()
{
    if ( input_bits_left == 0 )
    {
        current_byte++;
        input_bytes_left--;
        input_bits_left = 8;
        if ( input_bytes_left == 0 )
        {
            input_bytes_left = fread( buffer, 1, BUFFER_SIZE, stream );
            if ( input_bytes_left == 0 )
            {
                if ( past_eof )
                {
                    fprintf( stderr, "Bad input file\n" );
                    exit(-1);
                }
                else
                {
                    past_eof = 1;
                    input_bytes_left = 2;
                }
            }
            current_byte = buffer;
        }
    }
    /* the printed listing breaks off here; the remainder follows the
       Dr. Dobb's original on which this routine is based */
    input_bits_left--;
    return ( ( *current_byte >> input_bits_left ) & 1 );
}
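The bit ordering used by abitio.c (first bit into the most significant bit of the first byte) can be checked with a small in-memory version. This is only a sketch under stated assumptions: `BitWriter`, `bw_init`, `bw_put` and `bit_at` are hypothetical names, the output goes to a caller-supplied array instead of a file, and the caller is assumed to provide one spare byte beyond the data actually written.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char *buf;   /* caller-supplied output buffer */
    size_t pos;           /* index of the byte being filled */
    int mask;             /* 0x80 selects the first bit of a byte */
} BitWriter;              /* hypothetical stand-in for the file buffer */

static void bw_init(BitWriter *w, unsigned char *buf)
{
    w->buf = buf;
    w->pos = 0;
    w->mask = 0x80;
    w->buf[0] = 0;
}

/* Append one bit, most significant bit of each byte first. */
static void bw_put(BitWriter *w, int bit)
{
    if (bit)
        w->buf[w->pos] |= (unsigned char)w->mask;
    w->mask >>= 1;
    if (w->mask == 0) {
        w->mask = 0x80;
        w->pos++;
        w->buf[w->pos] = 0;   /* clear the next byte before it is used */
    }
}

/* Read bit number n (0 = first bit written) back out of a buffer. */
static int bit_at(const unsigned char *buf, size_t n)
{
    return (buf[n / 8] >> (7 - n % 8)) & 1;
}
```

Writing the nine bits 1,0,1,0,0,0,0,0,1 leaves 0xA0 in the first byte and 0x80 in the second, which matches the "msb first" convention described in the header comment of the listing.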
/*
 * Listing 3 - abitio.h
 *
 * This header file contains the function prototypes needed to use
 * the bitstream i/o routines.
 *
 */

int input_bit();
void initialize_output_bitstream(char *file, void *header, unsigned int header_size);
long flush_output_bitstream(void *header, unsigned int header_size);
void output_bit(int bit);
void initialize_input_bitstream(char *file, void *header, unsigned int header_size);
/*********************************************************************

AGEXP DECOMPRESSION UTILITY

The agexp program decompresses files created by agcmp to
a binary rasterized file (no headers) on the disk.

(File size is fixed and determined in agcmp.h)

FILES:

agexp.c - the main loop for decompression. Retrieves MR codes
          from the arithmetic coder and re-generates the raw
          binary file.

The following sources are common to both programs - agcmp and
agexp (Decompression) and handle the statistical estimation
(element frequency accumulation) and the arithmetic coding:

amdl.c  - statistical estimation. Based on a source from Dr. Dobb's
          Journal, February 1991, "Arithmetic Coding and Statistical
          Modeling" by Mark R. Nelson, but modified to fit compression
          of MR codes.

acoder.c, abitio.c - implement the arithmetic coder, based on Dr. Dobb's
          Journal.

COMPILATION:

agexp: cc agexp.c amdl.c acoder.c abitio.c

FURTHER INFORMATION about agexp.c.

AUTHOR: Arik Gordon
INPUT:  Compressed file.
OUTPUT: A rastered file (no headers) with 1728 binary pixels per line
USAGE:  agexp COMPRESSED_FILE_NAME RASTER_FILE_NAME
Desc :  This is the main loop for the agexp utility. It makes calls to the
        arithmetic coder to retrieve the MR codes, and then builds a
        rastered binary image.

*********************************************************************/



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <io.h>        /* open, write, close */
#include <memory.h>
#include <malloc.h>
#include <sys\types.h>
#include <sys\stat.h>
//#include <dos.h>
#include "acoder.h"
#include "amodel.h"
#include "abitio.h"
#include "agcmp.h"

find_next(int color, int pos, char *line, int len);
int uncompress_strip_and_save(unsigned char *compressed, long compressed_size, int fdo);
mr_uncompress(unsigned char *line, unsigned char *prev);
void huf_uncompress(unsigned char *line, unsigned char *str);
void get_bit_stream(char *str, char *buf, long compressed_size);
int pack8(unsigned char *line, unsigned char *buf);
int find_b1(int a0_color, int a0, char *line, int length);
void vertical_code(int *a0_color, int *a0, char *prev, int length, char *curr, int offset);
int find_huf_len(int *a0_color);

#define STRIP_SIZE 100  // can be any number, determines buffer size



main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "\nusage: %s G3_output_file_name IMG_file_name \n", argv[0]);
        exit(9);
    }
    agexp(argv[1], argv[2]);
}



agexp(char *infile, char *outfile)
{
    char line[PELS_PER_LINE], prev_line[PELS_PER_LINE], *bufo;
    int fdo;
    unsigned char *compressed;
    long compressed_size;   // in bits
    int j = 0, line_num = 0;
    AG_HEADER ag_header;

    if ((fdo = open(outfile, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IREAD | S_IWRITE)) < 1)
        BigErr(9, "AGEXP: cant open outfile");

    if ( (bufo = malloc(STRIP_SIZE*BYTES_PER_LINE)) == NULL)
        BigErr(9, "AGEXP: no mem");

    memset(prev_line, 0, PELS_PER_LINE);

    initialize_model();
    init_mr_model();
    initialize_input_bitstream(infile, &ag_header, sizeof(ag_header));
    initialize_arithmetic_decoder();
    init_get_1();

    printf("LINES: %ld TOTAL: %ld\n", ag_header.number_of_lines_in_file, ag_header.total_bytes);

    while ( mr_uncompress(line, prev_line) != -1 ) {
        if (line_num++ % 100 == 0)
            printf("line %d\r", line_num-1);
        memcpy(prev_line, line, PELS_PER_LINE);
        pack8(line, bufo+j*BYTES_PER_LINE);
        j++;
        if (j == STRIP_SIZE) {
            write(fdo, bufo, BYTES_PER_LINE*STRIP_SIZE);
            j = 0;
        }
    }

    if (j != 0)
        write(fdo, bufo, BYTES_PER_LINE*j);

    free_amdl_bufs();
    close_get_1();
    free(bufo);
    close(fdo);
}



/*********************************************************************/
/** This loop decompresses one rasterized line **/

mr_uncompress(unsigned char *line, unsigned char *prev)
{
    int a0_color = WHITE, b1, b2;
    int a0 = 0, Ma0a1, Ma1a2, code;

    line[0] = WHITE;   // force a white pixel on line beginning

    while (a0 < PELS_PER_LINE) {   // While not EOL
        code = get_1(MR_CONTROL);
        if (code == EOF)
            return(-1);
        switch (code) {
        case V0:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, 0);
            break;
        case VR1:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, 1);
            break;
        case VL1:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, -1);
            break;
        case HOR:
            Ma0a1 = find_huf_len(&a0_color);   // qpos is globally known
            Ma1a2 = find_huf_len(&a0_color);
            memset(line+a0, a0_color, Ma0a1);
            memset(line+a0+Ma0a1, !a0_color, Ma1a2);
            a0 += (Ma0a1+Ma1a2);
            break;
        case PASS:
            b1 = find_b1(a0_color, a0, prev, PELS_PER_LINE);
            b2 = find_next(a0_color, b1+1, prev, PELS_PER_LINE);
            memset(line+a0, a0_color, b2-a0);
            a0 = b2;
            break;
        case VR2:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, 2);
            break;
        case VL2:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, -2);
            break;
        case VR3:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, 3);
            break;
        case VL3:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, -3);
            break;
        }
    }
    return(1);
}



find_b1(int a0_color, int a0, char *line, int length)
{
    int b1;

    if (line[a0] == a0_color)
        b1 = find_next(!a0_color, a0+1, line, length);
    else {
        b1 = find_next(a0_color, a0+1, line, length);
        b1 = find_next(!a0_color, b1+1, line, length);
    }
    return(b1);
}



/* Builds partial rasterized line according to MR codes */

void vertical_code(int *a0_color, int *a0, char *prev, int length, char *curr, int offset)
{
    int a1, b1;

    b1 = find_b1(*a0_color, *a0, prev, length);
    a1 = b1 + offset;
    memset(curr + *a0, *a0_color, a1 - *a0);
    *a0_color = !(*a0_color);
    *a0 = a1;
}

find_huf_len(int *a0_color)
{
    int len;

    len = BW_SYMBOLS-1-get_1(*a0_color);
    if (len > 63)
        len = (len - 63) * 64;

    if (len < 64) {
        *a0_color = !*a0_color;
        return(len);
    } else
        return(len + find_huf_len(a0_color));
}

find_next(int color, int pos, char *line, int len)
{
    int i;
    char *ptr;

    if (pos > len-1)
        return(len);

    if ((ptr = memchr(line+pos, color, len-pos)) == NULL)
        return len;
    else
        return (ptr-line);
}

BigErr(int n, char *s)   // too many bits in strip.
{
    printf("Err %d - %s", n, s);
    exit(9);
}



static int *count, prev, prev1;

/***** Arithmetic decoder stuff *****/
init_get_1()
{
    count = malloc(sizeof(int) * 3);   // mr + b&w
    memset(count, 0, sizeof(int) * 3);
    prev = prev1 = 0;
}

/***** Arithmetic decoder stuff *****/
close_get_1()
{
    free(count);
    close_input_bitstream();
}

/***** gets one symbol from the arithmetic coder */
get_1(int mode)
{
    SYMBOL s;
    int c;

    get_symbol_scale( &s, mode, prev, prev1 );
    count[mode] = get_current_count( &s );
    c = convert_symbol_to_int( count[mode], &s );
    if (mode == MR_CONTROL) {
        prev1 = prev;
        prev = c;
    }



WO 96/12245    PCT/US95/13296

C:\ARIK\COMPRESS\PTNTSRC\AGEXP.C - Sun Aug 28 07:04:42 1994



    remove_symbol_from_stream( &s );
    if (c != EOF)
        update_model( c );
    return(c);
}



static pack_bytes, byte;

/*** routines for packing bytes to bits (for output) ***/
pack8(unsigned char *line, unsigned char *buf)
{
    int i = 0, j, k, color, new_pos, pos = 0, n, bits, count = 0;

    pack_bytes = 0;
    byte = 0;
    color = line[0];

    while ((new_pos = find_next(!color, pos, line, PELS_PER_LINE)) != PELS_PER_LINE) {
        pack_n_bits(color, new_pos - pos, &count, buf);
        pos = new_pos;
        color = !color;
    }
    pack_n_bits(color, PELS_PER_LINE - pos, &count, buf);
}

pack_n_bits(int color, int n, int *count, char *buf)
{
    int bits;
    static b_table[] = {0,1,3,7,15,31,63,127,255};

    while ( (*count + n) > 8) {
        if (*count != 0) {
            /* finish the partially filled byte first */
            bits = 8 - *count;
            byte = (byte << bits);
            if (color)
                byte |= ((color << bits)-1);
            buf[pack_bytes] = byte;
            pack_bytes++;
            n -= bits;
            *count = 0;
        } else {
            /* a whole byte of one color */
            if (color)
                byte = 255;
            else
                byte = 0;
            buf[pack_bytes] = byte;
            pack_bytes++;
            n -= 8;
        }
    }

    byte = (byte << n);
    if (color)
        byte |= b_table[n];
    (*count) += n;

    if (*count == 8) {
        buf[pack_bytes] = byte;
        pack_bytes++;
        *count = 0;
    }
}
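The packing stage of agexp.c turns a line stored as one pixel per byte into eight pixels per byte, most significant bit first. The listing does this run by run. A simpler per-pixel loop, swapped in here purely as a reference for checking the run-based routine, produces the same layout. The function name `pack_line` is hypothetical, and the sketch assumes the line length is a multiple of 8 (as 1728 is).

```c
#include <assert.h>
#include <string.h>

/* Pack `npix` pixels (one per byte, nonzero = black) into npix/8 bytes,
   first pixel in the most significant bit of out[0]. */
static void pack_line(const unsigned char *pixels, int npix, unsigned char *out)
{
    int i;

    assert(npix % 8 == 0);
    memset(out, 0, (size_t)(npix / 8));
    for (i = 0; i < npix; i++)
        if (pixels[i])
            out[i / 8] |= (unsigned char)(0x80 >> (i % 8));
}
```

For a 16-pixel line with black pixels at positions 0, 1, 7 and 12 to 15, the result is the two bytes 0xC1 and 0x0F. The per-pixel loop touches every pixel, whereas the run-based version in the listing handles whole bytes of a single color in one step.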







CLAIMS 

1. A method for compressing binarized images
comprising: 

receiving a binarized image and generating a 
first sequence of first code symbols representing the 
binarized image wherein at least one row of the image is 
represented in run-length encoded format; and 

encoding a portion of the first sequence of 
code symbols using a preliminary encoding scheme, thereby 
to provide a first portion of a second sequence of code 
symbols, and, while encoding, accumulating the frequency 
of at least some of the first code symbols thus far 
encoded and generating an additional portion of the 
second sequence using a modified version of the code 
scheme such that at least one subsequent code symbol in 
the first sequence with a large accumulated frequency is
encoded more compactly in the second portion than at 
least one subsequent code symbol in the first sequence 
with a small accumulated frequency. 

2. A method according to claim 1 wherein a modified
Huffman coding scheme is employed to generate the
first sequence of first code symbols.

3. A method for compressing binarized images 
comprising: 

receiving a binarized image and generating a 
first sequence of first code symbols representing the 
binarized image comprising a representation of one row of 
the binarized image and a representation of differences 
between at least one subsequent row and at least one 
previous row; and 

encoding a portion of the first sequence of 
code symbols using a preliminary encoding scheme, thereby 
to provide a first portion of a second sequence of code 
symbols, and, while encoding, accumulating the frequency
of at least some of the first code symbols thus far 
encoded and generating an additional portion of the 
second sequence using a modified version of the code 
scheme such that at least one subsequent code symbol in 
the first sequence with a large accumulated frequency is 
encoded more compactly in the second portion than at 
least one subsequent code symbol in the first sequence
with a small accumulated frequency. 

4. A method according to any of claims 1-3 
wherein the encoding scheme used to encode the first 
sequence of code symbols is continually modified such 
that code symbols in the first sequence with a large 
accumulated frequency are encoded more compactly in the 
second portion than subsequent code symbols in the first 
sequence with a small accumulated frequency. 

5. A method according to any of the preceding 
claims wherein a modified-read coding scheme is employed
to generate the first sequence of first code symbols. 

6. A method according to any of the preceding 
claims 1 - 4 wherein a modified modified-read coding
scheme is employed to generate the first sequence of 
first code symbols. 

7. A method according to any of the preceding 
claims and also comprising binarizing a discrete level 
image, thereby to provide the binarized image. 

8. A method according to any of the preceding 
claims 1-6 and also comprising binarizing a continuous 
level image, thereby to provide the binarized image. 

9. A method according to any of the preceding 
claims wherein arithmetic coding is employed to translate
the accumulated frequency of at least some of the first
code symbols into second code symbols.

10. Apparatus for compressing binarized images 
comprising: 

a run-length encoder operative to receive a 
binarized image and to generate a first sequence of first 
code symbols representing the binarized image wherein at 
least one row of the image is represented in run-length 
encoded format; and 

an adaptive encoder operative to encode a 
portion of the first sequence of code symbols using a 
preliminary encoding scheme, thereby to provide a first
portion of a second sequence of code symbols, and, while
encoding, to accumulate the frequency of at least some of
the first code symbols thus far encoded and to generate 
an additional portion of the second sequence using a 
modified version of the code scheme such that at least 
one subsequent code symbol in the first sequence with a 
large accumulated frequency is encoded more compactly in 
the second portion than at least one subsequent code 
symbol in the first sequence with a small accumulated 
frequency. 

11. Apparatus for compressing binarized images 
comprising:

a binarized image compressor operative to 
receive a binarized image and to generate a first sequence
of first code symbols representing the binarized
image, the first sequence comprising a representation of 
one row of the binarized image and a representation of 
differences between at least one subsequent row and at
least one previous row; and 

an adaptive encoder operative to encode a 
portion of the first sequence of code symbols using a 
preliminary encoding scheme, thereby to provide a first
portion of a second sequence of code symbols, and, while
encoding, to accumulate the frequency of at least some of
the first code symbols thus far encoded and to generate 
an additional portion of the second sequence using a 
modified version of the code scheme such that at least 
one subsequent code symbol in the first sequence with a
large accumulated frequency is encoded more compactly in 
the second portion than at least one subsequent code 
symbol in the first sequence with a small accumulated 
frequency. 

12. Apparatus according to any of the preceding
claims 10 - 11 wherein the binarized image compressor
employs a modified-read coding scheme to generate the
first sequence of first code symbols.

13. Apparatus according to any of the preceding 
claims 10 - 11 wherein the binarized image compressor 
employs a modified modified-read coding scheme to generate
the first sequence of first code symbols.

14. Apparatus according to any of the preceding 
claims 10 - 13 wherein the adaptive encoder employs 
arithmetic coding to translate the accumulated frequency 
of at least some of the first code symbols into second 
code symbols. 

15. Apparatus according to any of claims 10 - 14 
wherein the encoding scheme used to encode the first 
sequence of code symbols is continually modified such 
that code symbols in the first sequence with a large 
accumulated frequency are encoded more compactly in the
second portion than subsequent code symbols in the first 
sequence with a small accumulated frequency. 
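As an illustration of the "run-length encoded format" recited in the claims, the following minimal sketch converts one row of a binarized image into alternating white and black run lengths. It is not taken from the listings: the function name `row_to_runs` and the convention that the first run is white (and may be zero) are assumptions made for the example.

```c
#include <assert.h>

/* Write alternating white/black run lengths for `row` (one pixel per
   byte, nonzero = black) into `runs`. The first run is white and may be
   zero. Returns the number of runs written; `runs` needs room for
   npix + 1 entries in the worst case. */
static int row_to_runs(const unsigned char *row, int npix, int *runs)
{
    int n = 0, i = 0, color = 0;   /* 0 = white, 1 = black */

    assert(npix >= 0);
    while (i < npix) {
        int start = i;
        while (i < npix && (row[i] != 0) == color)
            i++;
        runs[n++] = i - start;
        color = !color;
    }
    return n;
}
```

The row 0,0,1,1,1,0 becomes the runs 2, 3, 1, and a row that begins with a black pixel starts with a white run of length zero. Run lengths of this kind are the BLACK and WHITE symbols whose frequencies the adaptive encoder accumulates.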
[Flowchart; reference numerals 220, 230, 240 and 250 label steps in the figure]

ALLOCATE TABLES:
    WHITE_RL[93]
    BLACK_RL[93]
    MR_CONTROL[10]

FOR I=0 TO 92
    WHITE_RL[I] = BLACK_RL[I] = I
FOR I=0 TO 9
    MR_CONTROL[I] = I

GET MR CODE ELEMENT & MR CODE ELEMENT TYPE

SUPPLY THE FREQUENCY INTERVALS STORED IN THE
APPROPRIATE TABLE TO ARITHMETIC CODER 90
OR ARITHMETIC DECODER 110

UPDATE APPROPRIATE TABLE:
INCREASE FREQUENCY OF CURRENT CODE ELEMENT

FOR EACH TYPE t,
REFRESH_STATISTICAL_TABLES:
    IF TABLE[NUM_OF_SYMBOLS]=N*
        FOR I=0 TO NUM_OF_SYMBOLS
            TABLE[I]=TABLE[I]/2
            IF TABLE[I]<TABLE[I-1]
                TABLE[I]=TABLE[I-1]+1
        NEXT I
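On the decoder side, "supplying the frequency intervals" amounts to finding which symbol's interval contains a given count. A minimal sketch of that lookup is shown here, assuming a cumulative table laid out as in the source listings, with index -1 reserved for the end-of-file symbol. The function name `lookup_symbol` is hypothetical.

```c
#include <assert.h>

/* Return the symbol c with totals[c] <= count < totals[c+1].
   `totals` must allow index -1 (the EOF slot); nsym is the number of
   ordinary symbols, so totals[nsym] holds the overall total. */
static int lookup_symbol(const short *totals, int nsym, int count)
{
    int c;

    assert(count >= 0 && count < totals[nsym]);
    for (c = nsym - 1; count < totals[c]; c--)
        ;   /* walk down until the interval containing count is found */
    return c;
}
```

With cumulative entries 0, 1, 2, 5, 6 (the first being the -1 slot), a count of 3 falls in the interval of symbol 1, a count of 5 in that of symbol 2, and a count of 0 yields -1, the end-of-file symbol.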